1.  Introduction


The  shuffle-exchange  graph  has  long  been  recognized  as  one  of  the  best 
structures  known  for  parallel  computation.  Among  its  many  applications,  a  shuffle- 
exchange  computer  can  be  used  to  compute  discrete  Fourier  transforms,  multiply 
matrices,  evaluate  polynomials,  perform  permutations  and  sort  lists  [S71,  P80,  S80]. 
The  algorithms  needed  for  these  operations  are  extremely  simple  and  many  require 
no  more  than  logarithmic  time  and  constant  space  per  processor. 

Recent  developments  in  Very  Large  Scale  Integration  (VLSI)  circuit  technology 
have  made  it  possible  to  fabricate  large  numbers  of  very  simple  processors  on  a 
single  chip.  As  most  of  the  processors  contained  in  a  shuffle-exchange  computer  are 
very  simple,  the  shuffle-exchange  graph  serves  as  an  excellent  basis  upon  which  to 
design  and  build  chip-sized  microcomputers.  One  of  the  main  difficulties  with  such 
an  architecture,  however,  is  the  problem  of  routing  the  wires  which  link  the 
processors  together  in  a  shuffle-exchange  network.  Current  fabrication  technology 
limits  the  designer  to  two  or  three  layers  of  insulated  wiring  on  a  chip  and  demands 
that  he  make  the  chip  as  small  in  area  as  possible. 

Abstracted,  the  designer’s  problem  becomes  the  mathematical  question  of  how  to 
embed  the  shuffle-exchange  graph  in  the  smallest  possible  two-dimensional  grid. 
Thompson  was  the  first  to  formalize  the  question  mathematically.  In  his  thesis 
[T80],  he  showed  that  any  layout  (i.e.,  embedding  in  a  two-dimensional  grid)  of  the 
N-node shuffle-exchange graph requires at least $\Omega(N^2/\log^2 N)$ area. In addition, he
described a layout requiring only $O(N^2/\log^{1/2} N)$ area. Shortly thereafter, Hoey and
Leiserson  [HL80]  described  an  embedding  for  the  shuffle-exchange  graph  in  the 
complex  plane  (which  we  refer  to  as  the  complex  plane  diagram)  and  showed  how 
the diagram could be used to find an $O(N^2/\log N)$-area layout for the N-node
shuffle-exchange  graph. 

In  this  paper,  we  investigate  the  algebraic  properties  of  the  complex  plane 
diagram in order to find several $O(N^2/\log^{3/2} N)$-area layouts for the N-node shuffle-exchange
graph. In addition to being asymptotically superior to previously
discovered  layouts,  the  layouts  described  in  this  paper  are  also  superior  for  small 
values  of  N.  In  fact,  one  of  these  layouts  serves  as  the  basis  for  the  more  recent 
work  of  Leighton  and  Miller  who  have  described  optimal  layouts  for  small  shuffle- 
exchange  graphs  in  [LM81]. 


Subsequent  to  the  completion  of  the  research  presented  in  this  paper,  we  learned 
that Rodeh and Steinberg independently discovered an $O(N^2/\log^{3/2} N)$-area layout
for the N-node shuffle-exchange graph. Their work is also based on the complex
plane diagram and appears in [SR81]. Even more recently, Kleitman, Leighton,
Lepley and Miller [KLLM81] have discovered an entirely new method for laying out
shuffle-exchange graphs which can be used to find asymptotically optimal
$O(N^2/\log^2 N)$-area layouts. Although their layouts are not entirely practical, they are
the  only  layouts  known  to  achieve  Thompson's  lower  bound  asymptotically. 

The  remainder  of  the  paper  is  divided  into  six  sections.  In  section  2,  we  define 
the  shuffle-exchange  graph  and  the  grid  model  of  a  chip.  We  also  describe 
Thompson's $O(N^2/\log^{1/2} N)$-area layout for the N-node shuffle-exchange graph. In
section  3,  we  define  the  complex  plane  diagram  for  the  shuffle-exchange  graph  and 
mention  several  of  its  properties.  In  section  4,  we  describe  several  layouts  for  the 
shuffle-exchange  graph  which  are  based  on  the  complex  plane  diagram.  These 
include a straightforward $O(N^2/\log N)$-area layout and several new $O(N^2/\log^{3/2} N)$-area
layouts. Section 5 contains some remarks and open questions, and sections 6
and  7  contain  the  acknowledgements  and  references. 

2.  Preliminaries 

2a)  The  shuffle-exchange  graph 

The  shuffle-exchange  graph  comes  in  various  sizes.  In  particular,  there  is  an 
N-node  shuffle-exchange  graph  for  every  N  which  is  a  power  of  two.  Each  node  of 
the $(N = 2^k)$-node shuffle-exchange graph is associated with a unique k-bit binary
string $a_{k-1} \cdots a_0$. Two nodes w and w' are linked via a shuffle edge if w' is a left
or right cyclic shift of w (i.e., if $w = a_{k-1} \cdots a_0$ and $w' = a_{k-2} \cdots a_0 a_{k-1}$ or
$w' = a_0 a_{k-1} \cdots a_1$, respectively). Two nodes w and w' are linked via an
exchange edge if w and w' differ only in the last bit (i.e., if $w = a_{k-1} \cdots a_1 0$ and
$w' = a_{k-1} \cdots a_1 1$ or vice-versa). As an example, we have drawn the 8-node
shuffle-exchange graph in Figure 1. Note that the shuffle edges are drawn with
solid  lines  while  the  exchange  edges  are  drawn  with  dashed  lines.  We  shall  follow 
this  convention  throughout  the  paper. 

Figure 1: The 8-node shuffle-exchange graph
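
The definition above can be made concrete with a short sketch. The function name `shuffle_exchange_edges` and the choice to represent nodes as integers are illustrative assumptions, not notation from the paper. Only left cyclic shifts are enumerated, since each right shift is the same undirected edge traversed in the other direction.

```python
def shuffle_exchange_edges(k):
    """Return (shuffle_edges, exchange_edges) for the N = 2**k node graph.

    Nodes are the integers 0 .. 2**k - 1, read as k-bit strings a_{k-1}...a_0.
    Each edge is a frozenset, so {w, w'} and {w', w} coincide; self-loops
    (for example the shuffle edge at 00...0) appear as one-element sets.
    """
    n = 1 << k
    mask = n - 1
    shuffle, exchange = set(), set()
    for w in range(n):
        # Left cyclic shift: a_{k-1} a_{k-2} ... a_0  ->  a_{k-2} ... a_0 a_{k-1}
        left = ((w << 1) & mask) | (w >> (k - 1))
        shuffle.add(frozenset((w, left)))
        # Exchange edge: flip the last bit a_0
        exchange.add(frozenset((w, w ^ 1)))
    return shuffle, exchange


shuffle, exchange = shuffle_exchange_edges(3)
print(sorted(sorted(e) for e in shuffle))
print(sorted(sorted(e) for e in exchange))
```

For k = 3 this lists the edges of the 8-node graph, including the two shuffle self-loops at 000 and 111.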


By  replacing  the  nodes  and  edges  of  the  shuffle-exchange  graph  by  processors 
and  wires  (respectively),  the  shuffle-exchange  graph  can  be  transformed  into  a  very 
powerful  parallel  computer  (which  we  call  the  shuffle-exchange  computer).  The 
computational  power  of  the  shuffle-exchange  computer  is  partly  derived  from  the 
fact that every pair of nodes in an N-node shuffle-exchange graph is linked by a
path containing at most $2 \log N$ edges and thus the communication time between
any  pair  of  processors  is  short. 

More  importantly,  however,  the  shuffle-exchange  computer  is  capable  of 
performing  a  perfect  shuffle  on  a  set  of  data  in  a  single  parallel  operation.  For 
example,  consider  a  deck  of  8  cards  distributed  among  the  8  processors  of  the  8- 
node  shuffle-exchange  graph  so  that  processor  000  initially  has  card  0,  processor 
001  initially  has  card  1,  processor  010  initially  has  card  2,  and  so  forth.  Next, 
consider a (parallel) operation of the shuffle-exchange computer in which each
processor $a_2 a_1 a_0$ sends its card across a shuffle edge to the neighboring processor
$a_1 a_0 a_2$. It is easily verified that, after completion of the operation, processor 000
contains  card  0  (the  top  card  in  the  shuffled  deck),  processor  001  contains  card  4 
(the  second  card  in  the  shuffled  deck),  and  so  forth. 
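
The card example can be replayed in a few lines. This is a minimal sketch; `shuffle_step` is a hypothetical helper name, and processors are indexed by the integer value of their bit strings.

```python
def shuffle_step(cards, k):
    """One parallel step: processor a_{k-1}...a_0 sends its card to
    processor a_{k-2}...a_0 a_{k-1} (its left cyclic shift)."""
    n = 1 << k
    mask = n - 1
    result = [None] * n
    for w, card in enumerate(cards):
        dest = ((w << 1) & mask) | (w >> (k - 1))
        result[dest] = card
    return result


# Processor i initially holds card i; one step performs a perfect shuffle.
deck = shuffle_step(list(range(8)), 3)
print(deck)  # [0, 4, 1, 5, 2, 6, 3, 7]
```

The output interleaves the two halves of the deck (0, 1, 2, 3 and 4, 5, 6, 7), which is the perfect shuffle described in the text.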

The  power  of  card  shuffling  and  its  mathematical  abstractions  is  well  known  to 
magicians and mathematicians [DGK81] as well as to computer scientists [S71,
S80]. For a good survey of the computational power of the shuffle-exchange
graph, we recommend Schwartz' paper on ultracomputers [S80]. In addition,
Stone’s  paper  [S71]  contains  a  nice  description  of  some  important  parallel 
algorithms  based  on  the  shuffle-exchange  graph. 

2b)  The grid model

Among  the  many  mathematical  models  that  have  been  proposed  for  VLSI 
computation,  the  most  widely  accepted  is  due  to  Thompson  and  is  known  as  the 
Thompson  grid  model  [T79,  T80].  The  grid  model  of  a  VLSI  chip  is  quite  simple. 
The  chip  is  presumed  to  consist  of  a  grid  of  vertical  and  horizontal  tracks  which 
are  spaced  apart  by  unit  intervals.  Processors  are  viewed  as  points  and  are  located 
only  at  the  intersection  of  grid  tracks.  Wires  are  routed  through  the  tracks  in  order 
to  connect  pairs  of  processors.  Although  a  wire  in  a  horizontal  track  is  allowed  to 
cross  a  wire  in  a  vertical  track  (without  making  an  electrical  connection),  pairs  of 
wires are not allowed to overlap for any distance or to overlap at corners (i.e.,
they cannot overlap in the same track). Further, wires are not allowed to overlap
processors to which they are not linked. (The routing of wires in this fashion is
also known as layer per direction routing and Manhattan routing.)

As  an  example,  we  have  included  a  grid  layout  for  the  8-node  shuffle-exchange 
graph  in  Figure  2.  As  before,  the  shuffle  edges  are  drawn  with  solid  lines  while  the 
exchange edges are drawn with dashed lines. Notice that we have omitted the
self-loops in Figure 2 since they are electrically redundant. In general, the processors
need  not  all  be  placed  on  a  single  horizontal  line  (as  they  are  in  this  example). 


Figure 2: A grid model layout of the 8-node shuffle-exchange graph.


Practical  considerations  dictate  that  the  area  of  a  VLSI  layout  be  as  small  as 
possible.  The  area  of  a  layout  in  the  grid  model  is  defined  to  be  the  product  of  the 
number  of  horizontal  tracks  and  the  number  of  vertical  tracks  which  contain  a 
processor  or  wire  segment  of  the  layout.  For  example,  the  layout  in  Figure  2  has 
area  48.  As  can  be  easily  observed,  this  is  far  from  optimal. 

2c)  Thompson's layout


Given any k-bit string w, define the size of w to be the number of 1-bits it
contains. For example, the size of 10110 is 3. Thompson's idea was to lay out the
$N = 2^k$ nodes of the shuffle-exchange graph on a straight line in order of
nondecreasing size. It is easily seen that shuffle edges link nodes which have the
same size and that exchange edges link nodes which have sizes differing by one.
Thus the edges of such a layout are relatively short. In fact, nodes connected by
shuffle edges can be placed in a group, so that only 2 horizontal tracks are used for
all the shuffle connections. The remaining horizontal tracks are occupied by
exchange  edges. 

The exchange edges are inserted from left to right so that each exchange edge
occupies two vertical tracks and a portion of the lowest horizontal track which is
empty at the time of its insertion. (For example, Figure 2 displays a layout for the
8-node shuffle-exchange graph designed in this way.) This well-known strategy for
inserting exchange edges guarantees that the number of horizontal tracks used will
be minimal, and equal to the maximum number of edges which must (at some
fixed point) overlap one another. Since exchange edges link nodes which differ in
size by one, it is easily seen that the maximum overlap is at most $O(\max_s B_s)$,
where $B_s$ is the number of nodes of size s.
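
The insertion strategy just described can be sketched as a greedy interval assignment. The following is an illustration under stated assumptions, not Thompson's exact layout: nodes of equal size are ordered by numerical value (the paper groups them by shuffle connections instead), only exchange edges are routed, and all function names are hypothetical.

```python
from math import comb


def thompson_order(k):
    """Nodes 0 .. 2**k - 1 sorted by nondecreasing number of 1-bits."""
    return sorted(range(1 << k), key=lambda w: (bin(w).count("1"), w))


def exchange_intervals(k):
    """Each exchange edge as the interval of node positions it spans."""
    pos = {w: i for i, w in enumerate(thompson_order(k))}
    return [tuple(sorted((pos[w], pos[w ^ 1]))) for w in range(0, 1 << k, 2)]


def assign_tracks(intervals):
    """Insert edges left to right, each on the lowest track that is free.
    Returns the number of horizontal tracks used."""
    track_end = []  # right endpoint of the last edge placed on each track
    for left, right in sorted(intervals):
        for t, end in enumerate(track_end):
            if end < left:
                track_end[t] = right
                break
        else:
            track_end.append(right)  # no free track: open a new one
    return len(track_end)


def max_overlap(intervals):
    """Largest number of intervals covering a common point (it suffices
    to test the left endpoints)."""
    return max(sum(l <= x <= r for l, r in intervals) for x, _ in intervals)


for k in range(2, 9):
    iv = exchange_intervals(k)
    print(k, assign_tracks(iv), max_overlap(iv), comb(k, k // 2))
```

In each row the number of tracks used equals the maximum overlap, and both stay below the largest size class $C(k, \lfloor k/2 \rfloor)$, as the argument above predicts.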

It is easy to show that $B_s = C(k,s)$ for each s, where

$C(k,s) = k!/[s!(k-s)!]$

is the well-known function for binomial coefficients. It is also well-known that
$C(k,s)$ achieves its maximum value at $s = k/2$ for any k. Using standard asymptotic
analysis, it is easily shown that $C(k,k/2) \sim (2/\pi)^{1/2} (2^k/k^{1/2})$ for large k. (For a
good review of such techniques, see Bender and Orszag's book [BO78].) Thus
Thompson's layout requires only $O(N/\log^{1/2} N)$ horizontal tracks. Since only 1 or
2 vertical tracks are needed to embed the vertical portions of the edges incident to
any given node, we can conclude that Thompson's layout has area $O(N^2/\log^{1/2} N)$.
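
The asymptotic estimate for the central binomial coefficient is easy to check numerically. The helper name `central_binomial_ratio` is an assumption made for this sketch.

```python
from math import comb, pi, sqrt


def central_binomial_ratio(k):
    """Ratio of C(k, k//2) to the estimate (2/pi)^(1/2) * 2^k / k^(1/2)."""
    estimate = sqrt(2 / pi) * 2**k / sqrt(k)
    return comb(k, k // 2) / estimate


for k in (10, 20, 50, 100):
    print(k, central_binomial_ratio(k))
```

The ratio approaches 1 from below as k grows, so with $N = 2^k$ the largest size class contains roughly $(2/\pi)^{1/2} N/\log^{1/2} N$ nodes.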

3.  The  Complex  Plane  Diagram 

In [HL80], Hoey and Leiserson observed that there is a very natural embedding
of the shuffle-exchange graph in the complex plane. In what follows, we describe
this embedding (which we call the complex plane diagram) and point out some of
its more important properties.

3a)  Definition

Let $\delta_k = e^{2\pi i/k}$ denote the kth primitive root of unity. Given any k-bit binary
string $w = a_{k-1} \cdots a_0$, let p(w) be the map which sends w to the point

$p(w) = a_{k-1} \delta_k^{k-1} + \cdots + a_1 \delta_k + a_0$

in the complex plane. As each node of the $(N = 2^k)$-node shuffle-exchange graph
corresponds to a k-bit binary string, it is possible to use the map to embed the
shuffle-exchange graph in the complex plane. For example, we have done this for
the 32-node shuffle-exchange graph (whence k = 5) in Figure 3. For simplicity,
each node is labeled with its value instead of its 5-bit binary string. (By the value
of a node, we mean the numerical value of the associated k-bit binary string.)


Figure  3:  The  complex  plane  diagram  for  the  32-node 
shuffle-exchange graph. (Taken from [HL80].)
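
The map p can be evaluated directly. The sketch below assumes nodes are given by their numerical value; the two-argument signature `p(w, k)` is a convenience of this illustration rather than the paper's notation.

```python
import cmath


def p(w, k):
    """Point of the complex plane assigned to the k-bit string with value w:
    a_{k-1} delta^{k-1} + ... + a_1 delta + a_0, with delta = e^{2 pi i / k}."""
    delta = cmath.exp(2j * cmath.pi / k)
    return sum((((w >> i) & 1) * delta**i for i in range(k)), 0j)


# Positions of a few nodes of the 32-node graph (k = 5).
for w in (0, 1, 3, 6, 31):
    print(w, format(w, "05b"), p(w, 5))
```

Up to rounding, nodes 0 and 31 both land on the origin, and nodes 3 (00011) and 6 (00110), which are cyclic shifts of one another, lie at the same distance from it.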

3b)  Properties

Examination  of  Figure  3  indicates  that  the  complex  plane  diagram  has  some 
very  interesting  properties.  First,  it  is  apparent  that  the  shuffle  edges  occur  in 
cycles (which we call necklaces) which are symmetrically placed about the origin.
This phenomenon is easily explained by the following identity:

$\delta_k \, p(a_{k-1} \cdots a_0) = a_{k-1} \delta_k^k + a_{k-2} \delta_k^{k-1} + \cdots + a_1 \delta_k^2 + a_0 \delta_k$
$= a_{k-2} \delta_k^{k-1} + \cdots + a_0 \delta_k + a_{k-1}$
$= p(a_{k-2} \cdots a_0 a_{k-1}).$

Thus traversal of a shuffle edge corresponds to a $2\pi/k$ rotation in the complex
plane.
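
The rotation identity can be confirmed numerically for every node of a small graph. This is a sketch only; `p`, `left_shift` and `max_rotation_error` are names chosen for the illustration.

```python
import cmath


def p(w, k):
    """Complex plane position of the k-bit string with value w."""
    delta = cmath.exp(2j * cmath.pi / k)
    return sum((((w >> i) & 1) * delta**i for i in range(k)), 0j)


def left_shift(w, k):
    """a_{k-1} a_{k-2} ... a_0  ->  a_{k-2} ... a_0 a_{k-1}."""
    return ((w << 1) & ((1 << k) - 1)) | (w >> (k - 1))


def max_rotation_error(k):
    """Largest deviation from delta * p(w) = p(left_shift(w)) over all nodes."""
    delta = cmath.exp(2j * cmath.pi / k)
    return max(abs(delta * p(w, k) - p(left_shift(w, k), k))
               for w in range(1 << k))


print(max_rotation_error(5), max_rotation_error(6))
```

Both errors are at the level of floating-point rounding, consistent with each shuffle edge being a rotation by $2\pi/k$ about the origin.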

Except  for  degenerate  cases,  the  preceding  identity  also  indicates  that  each 
necklace  is  composed  of  k  nodes,  each  a  cyclic  shift  of  the  other.  Such  necklaces 
are  called  full  necklaces.  Degenerate  necklaces  contain  fewer  than  k  nodes  and, 
because  they  must  have  some  symmetry,  are  mapped  entirely  to  the  origin  of  the 
complex plane diagram. For example, {00000} and {0101, 1010} are degenerate
necklaces while both {101, 011, 110} and {11100, 11001, 10011, 00111, 01110} are
full.  As  we  note  in  the  following  proposition,  the  number  of  degenerate  necklaces 
is  quite  small  compared  to  the  number  of  full  necklaces. 

Proposition 1: There are $O(N^{1/2})$ degenerate necklaces and $N/\log N - O(N^{1/2}/\log N)$
full necklaces in the N-node shuffle-exchange graph.

Proof: A node w is in a degenerate necklace if its binary representation has a
nontrivial symmetry with respect to cyclic shifts. Without loss of generality, such a
string of bits must consist of a block of k/p bits which is repeated p times where p
is some prime divisor of k. As there are $2^{k/p}$ binary strings of length k/p, this
means that the number of nodes in degenerate necklaces is at most

$\sum_{p \mid k,\ p > 1} 2^{k/p} \le O(N^{1/2}).$

The remaining $N - O(N^{1/2})$ nodes are in full necklaces. As each full necklace
contains $\log N$ nodes, there are $N/\log N - O(N^{1/2}/\log N)$ full necklaces. □
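
For small k, the counts in Proposition 1 can be obtained by brute force. The sketch below groups nodes into orbits under cyclic shift; the function names are illustrative assumptions.

```python
def necklaces(k):
    """Group the k-bit strings (as integers) into orbits under cyclic shift."""
    mask = (1 << k) - 1
    seen, result = set(), []
    for w in range(1 << k):
        if w in seen:
            continue
        orbit, v = [], w
        while v not in orbit:
            orbit.append(v)
            v = ((v << 1) & mask) | (v >> (k - 1))  # left cyclic shift
        seen.update(orbit)
        result.append(orbit)
    return result


def count_full_and_degenerate(k):
    """Return (number of full necklaces, number of degenerate necklaces)."""
    sizes = [len(orbit) for orbit in necklaces(k)]
    full = sum(1 for s in sizes if s == k)
    return full, len(sizes) - full


for k in range(3, 11):
    print(k, 1 << k, count_full_and_degenerate(k))
```

For k = 5 this reports 6 full necklaces and the 2 degenerate necklaces {00000} and {11111}. The degenerate count stays small relative to the full count as k grows.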

It will often be convenient to refer to a necklace by one of its nodes. In
particular, we will use the notation <w> to indicate the necklace generated by w.
This is simply the collection of cyclic shifts of w. For example, the necklace
generated by 101 is <101> = {101, 011, 110}.

Exchange  edges  are  also  embedded  in  a  very  regular  fashion  by  the  complex 
plane  diagram.  In  fact,  each  exchange  edge  is  embedded  as  a  horizontal  line 
segment  of  unit  length.  This  phenomenon  is  explained  by  the  identity 

$p(a_{k-1} \cdots a_1 0) + 1 = a_{k-1} \delta_k^{k-1} + \cdots + a_1 \delta_k + 1$
$= p(a_{k-1} \cdots a_1 1).$

In  some  cases,  several  exchange  edges  are  contained  in  the  same  horizontal  line 
of the diagram. Such lines are called levels. For example, there are 9 levels in the
diagram of the 32-node shuffle-exchange graph shown in Figure 3. We will use
the properties of levels to find $O(N^2/\log^{3/2} N)$-area layouts for the N-node shuffle-exchange
graph.
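
Both claims about exchange edges (unit length, horizontal) and the level count for the 32-node graph can be checked numerically. This sketch identifies a level with a distinct imaginary part, rounded to 9 decimal places; the rounding tolerance and the function names are assumptions of the illustration.

```python
import cmath


def p(w, k):
    """Complex plane position of the k-bit string with value w."""
    delta = cmath.exp(2j * cmath.pi / k)
    return sum((((w >> i) & 1) * delta**i for i in range(k)), 0j)


def exchange_edges_are_unit_horizontal(k, tol=1e-12):
    """Check that p(a...a_1 1) - p(a...a_1 0) = 1 for every exchange edge."""
    return all(abs(p(w | 1, k) - p(w & ~1, k) - 1) < tol
               for w in range(1 << k))


def count_levels(k, digits=9):
    """Number of distinct horizontal lines that contain a node."""
    return len({round(p(w, k).imag, digits) for w in range(1 << k)})


print(exchange_edges_are_unit_horizontal(5), count_levels(5))
```

For k = 5 the count is 9, matching the number of levels stated for Figure 3.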

4.  Layouts  Based  on  the  Complex  Plane  Diagram 

In  this  section,  we  present  several  layouts  of  the  shuffle-exchange  graph  which 
are  based  on  the  complex  plane  diagram.  We  commence  with  a  straightforward 
$O(N^2/\log N)$-area layout of the N-node shuffle-exchange graph. This layout has
been discovered by many researchers (including Hoey and Leiserson). Later, we
show how the layout can be modified so as to require only $O(N^2/\log^{3/2} N)$ area.

4a)  A straightforward $O(N^2/\log N)$-area layout

In  what  follows,  we  describe  a  straightforward  layout  of  the  shuffle-exchange 
graph which requires only $O(N^2/\log N)$ area. The layout is formed from a grid of
levels and necklaces which we refer to as the level-necklace grid. Each row of the
grid  corresponds  to  a  level  of  the  complex  plane  diagram.  The  columns  of  the  grid 
are  divided  into  consecutive  column  pairs,  each  pair  corresponding  to  a  necklace. 
The  leftmost  column  of  each  column  pair  corresponds  to  that  part  of  the  necklace 
which  is  contained  in  the  left  half  of  the  complex  plane.  Similarly,  the  rightmost 
column  of  each  pair  corresponds  to  the  part  of  the  necklace  contained  in  the  right 
half  of  the  complex  plane. 


The  rows  of  the  level-necklace  grid  must  have  the  same  top-to-bottom  order  as 
do  the  corresponding  levels  in  the  complex  plane  diagram.  The  columns,  however, 
may  be  arranged  arbitrarily  (provided  that  columns  corresponding  to  the  same 
necklace  are  adjacent  in  the  grid). 

Each  node  of  the  shuffle-exchange  graph  is  placed  at  the  intersection  of  the  row 
and  column  of  the  grid  which  correspond  to  the  level  and  part  of  the  necklace  (left 
half  or  right  half)  to  which  it  belongs  in  the  complex  plane  diagram.  For  example, 
we have done this for a random ordering of the necklaces of the 32-node shuffle-exchange
graph in Figure 4. (Notice that we have used just one column each for
the  degenerate  necklaces  <0>  and  <31>  since  they  each  contain  just  one  node.  In 
general  two  columns  will  be  required  for  necklaces  which  are  mapped  to  the  origin 
of  the  complex  plane  diagram,  but  the  nodes  of  each  such  necklace  should  still  be 
lumped together at a single point of the level-necklace grid.)

Figure 4: A level-necklace grid for the 32-node shuffle-exchange graph.


Given  a  level-necklace  grid  for  a  shuffle-exchange  graph,  it  is  not  difficult  to 
produce  a  layout  for  the  graph.  The  main  step  is  to  partition  the  exchange  edges  in 
each  row  of  the  grid  into  nonoverlapping  subsets.  Each  subset  can  then  be 
assigned  to  a  horizontal  track  of  the  layout.  Except  for  the  row  corresponding  to 
the  real  line  in  the  complex  plane  diagram,  the  assignment  of  subsets  to  horizontal 
tracks within a row is arbitrary. (The assignment of horizontal tracks containing
nodes  on  the  real  line  must  preserve  the  cyclic  orientation  of  the  nodes  which  are 
in  necklaces  that  are  mapped  to  the  origin.) 

Once  this  is  done,  the  exchange  edges  can  be  inserted  in  the  horizontal  tracks 
and  the  shuffle  edges  can  be  inserted  in  the  vertical  tracks.  (To  be  precise,  some  of 
the  shuffle  edges  also  occupy  part  of  a  horizontal  track  at  the  top  or  bottom  of  the 
layout.)  By  Proposition  1,  the  number  of  vertical  tracks  occupied  by  the  necklaces 
is at most $2N/\log N + O(N^{1/2})$. Since there are precisely N/2 exchange edges, at
most N/2 + 2 horizontal tracks are contained in the layout. Thus the total area
of the layout of the N-node shuffle-exchange graph is at most $N^2/\log N + O(N^{3/2})$.
As an example, we have displayed in Figure 5 a layout of the 32-node shuffle-exchange
graph produced from the level-necklace grid in Figure 4.

[Figure 5 columns are labeled by necklace: <3>, <7>, <11>, <1>, <5>, <0>, <15>, <31>.]

Figure 5: Layout of the 32-node shuffle-exchange graph
produced from the level-necklace grid shown in Figure 4.


4b)  An improved $O(N^2/\log^{3/2} N)$-area layout

It  is  possible  to  improve  the  layout  described  in  section  4a  by  reducing  the 
number  of  horizontal  tracks  needed  to  embed  the  exchange  edges.  This  can  be 
done  by  reordering  the  necklaces  from  left  to  right  so  as  to  increase  the  average 
number  of  exchange  edges  which  can  be  inserted  on  each  horizontal  track.  For 
example,  the  ordering  of  the  necklaces  shown  in  Figure  6  results  in  far  fewer 
horizontal  tracks  being  used  than  did  the  ordering  of  necklaces  shown  in  Figure  5. 

Figure 6: An improved layout for the 32-node shuffle-exchange graph.


Although  we  do  not  know  how  to  best  order  the  necklaces  in  general,  we  have 
found several orderings which yield $O(N^2/\log^{3/2} N)$-area layouts for the N-node
shuffle-exchange  graph.  For  instance,  we  will  show  in  what  follows  that  such  a 
layout  can  be  constructed  by  arranging  the  necklaces  from  left  to  right  in  order  of 
nondecreasing  size.  (The  size  of  a  necklace  is  simply  defined  to  be  the  size  of  any 
of  its  nodes.)  As  an  example,  the  layout  displayed  in  Figure  6  is  of  this  form. 
(This  observation  has  also  been  made  by  Steinberg  and  Rodeh  in  [SR81].) 

In  order  to  bound  the  number  of  horizontal  tracks  needed  to  insert  the  exchange 
edges, we will show that the maximum overlap of exchange edges on each level is
at most the number of nodes of size $h = \lfloor (k-1)/2 \rfloor$ on that level. Since the
maximum overlap of exchange edges on each level is an upper bound on the
number of horizontal tracks needed to insert the exchange edges on that level, we
can thus conclude that the total number of horizontal tracks needed to insert all of
the exchange edges is at most

$B_h \le (2/\pi)^{1/2} N/\log^{1/2} N + O(N/\log^{3/2} N).$

Thus the resulting layout will have area at most

$2(2/\pi)^{1/2} N^2/\log^{3/2} N + O(N^2/\log^{5/2} N).$

Although it is clear that the maximum total overlap (over all levels) of exchange
edges is at most $B_h$, this is not sufficient to prove the result since any layout
must also preserve the top-to-bottom partial order induced by the necklace
structure on the exchange edges. It is only within individual levels that the top-to-bottom
ordering of exchange edges is arbitrary. (As we noted earlier, some minor
precautions are necessary for the level corresponding to the real line.) It is not
immediately clear, however, why the maximum overlap on each level is at most the
number of nodes of size h < k/2 on that level. In what follows, we establish this
result by breaking up each level into sublevels (for which the analysis is easier) and
showing that the maximum overlap on each sublevel is at most the number of
nodes of size h on that sublevel. The analysis requires some additional notation.

Consider a node of the form $a_{k-1} \cdots a_0$ for which either $a_{k-i} = 0$ or $a_i = 0$ or
both for each i < k. We will refer to such a node as a basis node. A node
$b_{k-1} \cdots b_0$ is said to be generated by the basis node $a_{k-1} \cdots a_0$ if

1) $b_{k-i} = a_{k-i}$ and $b_i = a_i$ whenever $a_{k-i} \ne a_i$ for $1 \le i \le k-1$, and

2) $b_{k-i} = b_i$ whenever $a_{k-i} = a_i = 0$ for $1 \le i \le k-1$.

For example, 10000 generates 10001, 11100 and 11101 but not 11111.

It is not difficult to show that if u generates v, then both u and v are on the same
level of the complex plane diagram. To see this, let $u = a_{k-1} \cdots a_0$ and
$v = b_{k-1} \cdots b_0$ and observe that

$p(v) - p(u) = (b_{k-1} - a_{k-1}) \delta_k^{k-1} + \cdots + (b_1 - a_1) \delta_k + (b_0 - a_0)$
$= c_{k-1} \delta_k^{k-1} + \cdots + c_1 \delta_k + c_0$

where $c_{k-i} = c_i$ for each $1 \le i \le k-1$. Since $\delta_k^{k-i}$ is the complex conjugate of
$\delta_k^i$ for $1 \le i \le k-1$, we can conclude that p(v) - p(u) is a real number and thus
that u and v are in the same level of the complex plane diagram.

It  is  also  easy  to  show  that  each  node  of  the  shuffle-exchange  graph  is  generated 
by a unique basis node. In particular, the node which generates $b_{k-1} \cdots b_0$ can
be found by

1) setting $b_0 = 0$ and (if k is even) setting $b_{k/2} = 0$, and

2) setting $b_i = b_{k-i} = 0$ for each i such that (originally) $b_i = b_{k-i} = 1$.
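
The two-step procedure for finding a basis node, together with the definition of "generates", can be sketched as follows. The names `basis_of` and `generates` are hypothetical, nodes are integers, and `generates` assumes its first argument is already a basis node.

```python
def bit(w, i):
    """Bit a_i of the integer w."""
    return (w >> i) & 1


def basis_of(b, k):
    """Basis node that generates b, following the two steps above."""
    a = b & ~1                                  # step 1: set b_0 = 0
    if k % 2 == 0:
        a &= ~(1 << (k // 2))                   # step 1, k even: set b_{k/2} = 0
    for i in range(1, k):
        if bit(b, i) and bit(b, k - i):         # step 2: clear pairs of ones
            a &= ~((1 << i) | (1 << (k - i)))
    return a


def generates(a, b, k):
    """Conditions 1) and 2) of the definition, for a basis node a."""
    for i in range(1, k):
        ai, aki = bit(a, i), bit(a, k - i)
        if ai != aki and (bit(b, i) != ai or bit(b, k - i) != aki):
            return False                        # condition 1 violated
        if ai == 0 and aki == 0 and bit(b, i) != bit(b, k - i):
            return False                        # condition 2 violated
    return True


a = 0b10000
print([format(b, "05b") for b in range(32) if generates(a, b, 5)])
```

With k = 5 the list printed for the basis node 10000 is 10000, 10001, 11100, 11101, which agrees with the example in the text and excludes 11111.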

Since  exchange  edges  link  nodes  which  have  the  same  basis  node,  we  can 
conclude  from  the  preceding  arguments  that  it  is  possible  to  partition  each  level  of 
the  complex  plane  diagram  into  sublevels  so  that  the  nodes  in  each  sublevel  are 
precisely  the  nodes  generated  by  some  basis  node.  We  will  now  show  that  the 
maximum  overlap  on  each  sublevel  is  at  most  the  number  of  nodes  of  size  h  on 
that  sublevel. 

Since the necklaces have been arranged from left to right in order of
nondecreasing size, the overlap of exchange edges between two nodes of size s in
any sublevel is at most $O(\max_s B_s')$ where $B_s'$ is the number of nodes in that
sublevel with size s. In the following proposition, we compute $B_s'$ and show that
its maximum for any sublevel occurs at s = h.

Proposition 2: Each basis node of size r generates $B_s'$ nodes of size s, where

1) $B_s' = C(h-r, i)$ for $s = r + 2i$ and $i \le h - r$, and

2) $B_s' = C(h-r, i)$ for $s = r + 2i + 1$ and $i \le h - r$

when k is odd, and

1) $B_s' = C(h-r+1, i)$ for $s = r + 2i$ and $i \le h - r + 1$, and

2) $B_s' = 2C(h-r, i)$ for $s = r + 2i + 1$ and $i \le h - r$

when k is even.

Proof: When k is odd, there are precisely h - r pairs $a_j = a_{k-j} = 0$ in a basis
node of size r. In order to generate a string of size s = r + 2i when k is odd, we
must set $b_0 = 0$ and choose i of the h - r pairs so that $b_j = b_{k-j} = 1$. There are C(h-r, i)
such strings. To generate a string of size s = r + 2i + 1 when k is odd, we must set
$b_0 = 1$ and choose i of the h - r pairs so that $b_j = b_{k-j} = 1$. As before, there are
C(h-r, i) such strings.

When k is even, there is also the degenerate pair $a_{k/2} = 0$. To generate a string
of size s = r + 2i when k is even, we must choose i of the h - r + 1 pairs so that
$b_j = b_{k-j} = 1$ (this count includes the "pair" $b_0 = b_{k/2} = 1$). There are C(h-r+1, i)
such strings. To generate a string of size s = r + 2i + 1 when k is even, we must set
either $b_0 = 1$ and $b_{k/2} = 0$ or $b_0 = 0$ and $b_{k/2} = 1$, and choose i of the h - r pairs so
that $b_j = b_{k-j} = 1$ ($j \ne k/2$). There are 2C(h-r, i) such strings. □
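
Proposition 2 lends itself to a brute-force check for small k. The sketch below groups all nodes by their basis node, tallies sizes within each sublevel, and compares the tallies with the stated binomial formulas. All function names are assumptions of this illustration. The final check, that the largest tally in each sublevel occurs at size h, is run for odd k only.

```python
from collections import Counter
from math import comb


def basis_of(b, k):
    """Basis node generating b: clear b_0, b_{k/2} (k even), and pairs of ones."""
    a = b & ~1
    if k % 2 == 0:
        a &= ~(1 << (k // 2))
    for i in range(1, k):
        if (b >> i) & 1 and (b >> (k - i)) & 1:
            a &= ~((1 << i) | (1 << (k - i)))
    return a


def sublevel_size_counts(k):
    """Map each basis node to a Counter {size s: number of generated nodes}."""
    counts = {}
    for b in range(1 << k):
        counts.setdefault(basis_of(b, k), Counter())[bin(b).count("1")] += 1
    return counts


def predicted(k, r, s):
    """B_s' from Proposition 2 for a basis node of size r."""
    if s < r:
        return 0
    h = (k - 1) // 2
    i, odd = divmod(s - r, 2)
    if k % 2 == 1:
        return comb(h - r, i)
    return 2 * comb(h - r, i) if odd else comb(h - r + 1, i)


def check_proposition_2(k):
    """True if every sublevel matches the predicted counts for all sizes."""
    for a, counter in sublevel_size_counts(k).items():
        r = bin(a).count("1")
        if any(counter[s] != predicted(k, r, s) for s in range(k + 1)):
            return False
    return True


def max_is_at_h(k):
    """True if, in every sublevel, the largest count occurs at size h."""
    h = (k - 1) // 2
    return all(max(c.values()) == c[h] for c in sublevel_size_counts(k).values())


print([check_proposition_2(k) for k in range(3, 9)])
print([max_is_at_h(k) for k in (3, 5, 7, 9)])
```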

Given Proposition 2, it is easily checked that the maximum value of $B_s'$ for any
sublevel (independent of the value of r) occurs when s = h. Thus the sum (over all
sublevels) of the maximum overlap at each sublevel is at most the number of nodes
of size $h = \lfloor (k-1)/2 \rfloor$ in the entire graph. This is at most
$C(k, k/2) \sim (2/\pi)^{1/2} (2^k/k^{1/2})$. Thus the total area of the layout is no more than

$2(2/\pi)^{1/2} N^2/\log^{3/2} N + O(N^2/\log^{5/2} N),$

as  claimed. 

4c)  Additional $O(N^2/\log^{3/2} N)$-area layouts

By  varying  the  order  of  the  necklaces  in  the  level-necklace  grid,  it  is  possible  to 
produce  a  variety  of  layouts  for  the  shuffle-exchange  graph  which  require  at  most 
$O(N^2/\log^{3/2} N)$ area. The complex plane diagram itself suggests one such ordering.
For  example,  consider  an  arrangement  of  the  necklaces  from  left  to  right  in  order 
of  nondecreasing  radius.  (The  radius  of  a  necklace  is  defined  to  be  the  distance  of 
its  nodes  from  the  origin  in  the  complex  plane  diagram.)  Such  a  layout 
corresponds  to  a  folding  of  the  complex  plane  diagram  along  its  imaginary  axis 
followed  by  a  straightening  of  the  necklaces.  In  what  follows,  we  will  show  that, 
like a layout by necklace size, a layout by necklace radius has area $O(N^2/\log^{3/2} N)$.

Because  the  layout  by  radius  is  so  closely  related  to  the  complex  plane  diagram, 
our analysis will center on the complex plane diagram itself. As before, we will
partition  the  levels  into  sublevels  and  find  an  upper  bound  on  the  maximum 
overlap  of  exchange  edges  on  each  sublevel  separately.  The  number  of  horizontal 
tracks  needed  to  insert  the  exchange  edges  will  then  be  at  most  the  sum  of  these 
upper bounds. We will show that this sum is at most $O(N/\log^{1/2} N)$.

Notice  that  the  maximum  overlap  of  exchange  edges  on  a  sublevel  of  the  level- 
necklace  grid  is  at  most  twice  the  maximum  overlap  on  that  sublevel  in  the 
complex  plane  diagram.  (The  factor  of  two  is  introduced  by  the  "folding"  of  the 
diagram along its imaginary axis. Although straightening the necklaces might
affect  the  maximum  total  overlap  of  exchange  edges,  it  does  not  affect  the  overlap 
within  a  sublevel.) 


Within a sublevel, an exchange edge can be identified by the real part of its
midpoint. For example, the real part of the midpoint of exchange edge
$(b_{k-1} \cdots b_1 0,\ b_{k-1} \cdots b_1 1)$ is

$b_{k-1} \cos[2\pi(k-1)/k] + \cdots + b_1 \cos[2\pi/k] + 1/2.$


If a is a basis element of a sublevel, then a generates the other nodes in that
sublevel by substitution of the appropriate pairs of ones. For instance, we may set
$b_j = b_{k-j} = 1$ if $a_j = a_{k-j} = 0$. Let

$T_a = \{\, 1 \le j \le h \mid a_j = a_{k-j} = 0 \,\}$

denote those indices $1 \le j \le h$ where a pair of 1-bits may be substituted for a pair
of 0-bits. (As before, $h = \lfloor (k-1)/2 \rfloor$, but for convenience, we shall henceforth
assume that k is odd.) Notice that if b is generated by a, then the real part of the
midpoint of the exchange edge incident to b is

$\sum_{i \in T_a} 2 b_i \cos(2\pi i/k) + \sum_{1 \le i \le h,\ i \notin T_a} \cos(2\pi i/k) + 1/2.$


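We now introduce a random variable $Z_a$, which has as its image all of the real
parts of the midpoints of edges in the sublevel generated by $a$. Since
$b_i = b_{k-i}$ can be either 0 or 1 when $i \in T_a$, let $B_i$ be a random
variable representing this choice. In particular,

$B_i = 0$ with probability 1/2, and
$B_i = 1$ with probability 1/2.

Then

$$Z_a = \sum_{i \in T_a} 2\cos(2\pi i/k)\, B_i \;+ \sum_{1 \le i \le h,\ i \notin T_a} \cos(2\pi i/k) \;+\; 1/2$$

$$\phantom{Z_a} = \sum_{i \in T_a} 2\cos(2\pi i/k)\,(B_i - 1/2).$$

(The second equality uses the identity $\sum_{i=1}^{h} \cos(2\pi i/k) = -1/2$,
which holds for odd $k$.)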
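The identity between the midpoint positions and the centered sum defining $Z_a$ can be verified by enumeration on a small case. The sketch below assumes that every pair $(a_i, a_{k-i})$ of a basis node outside $T_a$ contains exactly one 1 (an assumption inferred from the count of sublevels used later in the argument) and that $a_0 = 0$. The helper names and the list encoding `a[j]` for bit $j$ are hypothetical.

```python
from itertools import product
import math

def free_pairs(k, a):
    """T_a: indices 1 <= i <= h with a_i = a_{k-i} = 0 (k odd)."""
    h = (k - 1) // 2
    return [i for i in range(1, h + 1) if a[i] == 0 and a[k - i] == 0]

def midpoint_real(k, b):
    """Real part of the midpoint of the exchange edge at node b."""
    return sum(b[j] * math.cos(2 * math.pi * j / k) for j in range(1, k)) + 0.5

def sublevel(k, a):
    """Yield (B, b): each choice B of bits on T_a and the node b it generates."""
    T = free_pairs(k, a)
    for choice in product((0, 1), repeat=len(T)):
        b = list(a)
        for i, bit in zip(T, choice):
            b[i] = b[k - i] = bit      # substitute a pair of equal bits
        yield dict(zip(T, choice)), b

def z_value(k, B):
    """Centered form: sum over T_a of 2 cos(2 pi i / k) (B_i - 1/2)."""
    return sum(2 * math.cos(2 * math.pi * i / k) * (bit - 0.5)
               for i, bit in B.items())

# Example basis node for k = 7: pairs (1,6) and (3,4) are 00, pair (2,5) is 10.
k = 7
a = [0, 0, 1, 0, 0, 0, 0]              # a[0], a[1], ..., a[6]
pairs = [(midpoint_real(k, b), z_value(k, B)) for B, b in sublevel(k, a)]
```

For this example $T_a = \{1, 3\}$, so the sublevel holds four nodes, and the two entries of each tuple in `pairs` agree up to rounding.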


Since the exchange edges have unit length in the complex plane diagram, two
edges overlap if and only if their midpoints are within unit distance of each
other. Thus the number of edges which overlap at position $x$ on the sublevel
generated by a node $a$ is given by the formula

$$2^{|T_a|}\, \mathrm{Prob}[\, x - 1/2 \le Z_a \le x + 1/2 \,],$$

where $|T_a|$ denotes the cardinality of $T_a$. (We caution the reader that the
notation $|x|$ is also used to denote the absolute value of $x$.)

Although the distribution function of $Z_a$ is difficult to analyze directly, it
does behave like a normal distribution. This is because $Z_a$ is the sum of
independent random variables which have mean 0 and variance
$\sigma_i^2 = \cos^2(2\pi i/k)$. The Berry-Esseen Theorem states precisely how
far $Z_a$ can vary from a normal distribution. (For a proof of this theorem see
[F71].)

Berry-Esseen Theorem: Let $X_1, X_2, \ldots, X_m$ be independent random
variables such that $E(X_i) = 0$, $E(X_i^2) = \sigma_i^2$, and
$E(|X_i|^3) = \rho_i$ for $1 \le i \le m$. Set
$s^2 = \sigma_1^2 + \cdots + \sigma_m^2$ and $r = \rho_1 + \cdots + \rho_m$. In
addition, let $F$ denote the cumulative distribution function of the sum
$(X_1 + \cdots + X_m)/s$. Then for all $x$,

$$|F(x) - \Phi(x)| \le 6r/s^3$$

where $\Phi$ is the standard normal cumulative distribution function. □

In the case of a sublevel generated by a node $a$, we have

$$X_i = 2\cos(2\pi i/k)\,(B_i - 1/2) \quad \text{for } i \in T_a,$$

$$s_a^2 = \sum_{i \in T_a} \cos^2(2\pi i/k), \quad \text{and}$$

$$r_a = \sum_{i \in T_a} |\cos^3(2\pi i/k)| .$$

Applying the Berry-Esseen Theorem, we can thus conclude that

$$\mathrm{Prob}[\, x - 1/2 \le Z_a \le x + 1/2 \,] = \mathrm{Prob}[\, (x - 1/2)/s_a \le Z_a/s_a \le (x + 1/2)/s_a \,]$$

$$\le \Phi[(x + 1/2)/s_a] - \Phi[(x - 1/2)/s_a] + 12 r_a/s_a^3 .$$

Because the standard normal density function is symmetric and unimodal, this
bound is largest at $x = 0$, and so the maximum of
$\mathrm{Prob}[\, x - 1/2 \le Z_a \le x + 1/2 \,]$ is at most
$O(1/s_a + r_a/s_a^3)$.
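For small cases the window probability can be computed exactly and compared with the two terms of the bound. The sketch below enumerates all $2^{|T_a|}$ equally likely values of $Z_a$ (each $X_i$ equals $\pm\cos(2\pi i/k)$), finds the most heavily loaded closed window of width 1, and evaluates the normal term at $x = 0$ together with the error term $12 r_a/s_a^3$. The function names, the example set `T` and the small floating-point tolerance are illustrative assumptions.

```python
from itertools import product
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def window_stats(k, T):
    """Return (exact max window probability, normal term at x = 0,
    Berry-Esseen error term, max number of overlapping edges)."""
    c = [math.cos(2 * math.pi * i / k) for i in T]
    s = math.sqrt(sum(x * x for x in c))
    r = sum(abs(x) ** 3 for x in c)
    # Each X_i = 2 cos(2 pi i / k) (B_i - 1/2) takes the values +c_i and -c_i.
    values = sorted(sum(x * sign for x, sign in zip(c, signs))
                    for signs in product((-1, 1), repeat=len(c)))
    # Two-pointer sweep: most values inside any closed window of width 1.
    best, lo = 0, 0
    for hi, v in enumerate(values):
        while v - values[lo] > 1.0 + 1e-12:
            lo += 1
        best = max(best, hi - lo + 1)
    exact = best / len(values)
    normal = phi(0.5 / s) - phi(-0.5 / s)
    error = 12.0 * r / s ** 3
    return exact, normal, error, best

# Example: k = 23 with every pair free, i.e. T_a = {1, ..., 11}.
exact, normal, error, overlap = window_stats(23, list(range(1, 12)))
```

For sizes small enough to enumerate, the error term typically exceeds 1, so the inequality `exact <= normal + error` holds trivially. The bound only becomes informative asymptotically, which is how it is used in the argument.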

In the following proposition, we find bounds for the values of $r_a$ and $s_a$.


Proposition 3: For any basis node $a$,

$$r_a = \sum_{i \in T_a} |\cos^3(2\pi i/k)| \le |T_a| \quad \text{and}$$

$$s_a^2 = \sum_{i \in T_a} \cos^2(2\pi i/k) \ge \Omega(|T_a|^3/k^2).$$

Proof: The bound on $r_a$ is easy to compute since $|\cos^3(2\pi i/k)| \le 1$.
The calculation of $s_a$ is a bit more tedious. In order to obtain a lower
bound, $\cos^2(2\pi i/k)$ must be made as small as possible. The smallest values
occur when $T_a$ contains indices $i$ which are as close to $(k-1)/4$ as
possible. In this case, we can approximate $\cos^2(2\pi i/k)$ with the value
$c(\pi/2 - 2\pi i/k)^2$, for some constant $c$. Direct computation reveals that
the sum of these squares is at least $\Omega(|T_a|^3/k^2)$. □
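The lower bound on $s_a^2$ can be probed numerically. The sketch below takes the worst case described in the proof, in which $T_a$ consists of the $t$ indices whose values of $\cos^2(2\pi i/k)$ are smallest, and compares the resulting sum with $t^3/k^2$. The function name and the choice $k = 101$ are illustrative; the constant hidden in the $\Omega$ is not specified in the text, so the ratio is reported rather than assumed.

```python
import math

def min_s_squared(k, t):
    """Smallest possible s_a^2 over all sets T_a of size t (k odd)."""
    h = (k - 1) // 2
    squares = sorted(math.cos(2 * math.pi * i / k) ** 2
                     for i in range(1, h + 1))
    return sum(squares[:t])            # the t smallest terms cluster near k/4

k = 101
h = (k - 1) // 2
# Ratio of the worst-case s_a^2 to t^3 / k^2 for every admissible size t.
ratios = [min_s_squared(k, t) / (t ** 3 / k ** 2) for t in range(1, h + 1)]
worst_ratio = min(ratios)
```

In this experiment `worst_ratio` stays above 1, which is consistent with a bound of the form $s_a^2 \ge c\,|T_a|^3/k^2$ for a constant $c$.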

Since $|T_a| \le k$ for all $a$, we can conclude from the preceding that the
maximum overlap of exchange edges on a sublevel generated by $a$ is at most

$$O(2^{|T_a|}\, k^3/|T_a|^{7/2}).$$

Noting that there are precisely $C(h,j)\, 2^{h-j}$ sublevels generated by a node
$a$ for which $|T_a| = j$ and summing, we can conclude that the total number of
horizontal tracks needed to insert all of the exchange edges is at most

$$\sum_{j=1}^{h} C(h,j)\, 2^{h-j}\, O(2^j k^3/j^{7/2}) = O\Big[ k^3\, 2^h \sum_{j=1}^{h} C(h,j)/j^{7/2} \Big].$$

It is not difficult to check that the dominant terms in the preceding sum occur
when $j = h/2 \pm O(h^{1/2}\log h)$. In this region, $j = \Theta(h)$ and thus the
sum is bounded above by

$$O\Big( 2^h k^{-1/2} \sum_{j=1}^{h} C(h,j) \Big) = O(2^k/k^{1/2}) = O(N/\log^{1/2} N),$$

thus completing the proof that a layout by necklace radius takes at most
$O(N^2/\log^{3/2} N)$ area.
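The growth rate of the track count can be examined numerically. The sketch below evaluates the sum $\sum_{j=1}^{h} C(h,j)\,2^{h-j}\cdot 2^j k^3/j^{7/2}$ with all hidden constants taken to be 1 (an assumption made only for illustration) and divides it by $2^k/k^{1/2}$. The function name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math

def track_sum_ratio(k):
    """Ratio of sum_j C(h,j) 2^(h-j) * 2^j k^3 / j^(7/2) to 2^k / k^(1/2).

    k is assumed odd, with h = (k - 1) / 2. Since 2^(h-j) * 2^j = 2^h and
    2^k = 2^(2h+1), the ratio simplifies to
        k^(7/2) * sum_j C(h,j) * j^(-7/2) / 2^(h+1),
    which avoids forming very large floating-point numbers.
    """
    h = (k - 1) // 2
    total = sum(math.comb(h, j) / 2 ** (h + 1) * j ** -3.5
                for j in range(1, h + 1))
    return k ** 3.5 * total

ratios = {k: track_sum_ratio(k) for k in (51, 101, 201, 401)}
```

If the ratios remain bounded as $k$ grows, the sum is $O(2^k/k^{1/2}) = O(N/\log^{1/2} N)$, in agreement with the estimate in the text.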


5.  Remarks 


It is worth remarking that the $O(N^2/\log^{3/2} N)$-area layouts for the
shuffle-exchange graph described in section 4 actually require
$\Omega(N^2/\log^{3/2} N)$ area and thus our analysis of these layouts cannot be
improved by more than a constant factor. In each case, the lower bound on area
can be derived from the fact that the maximum total overlap of exchange edges in
the layouts is at least $\Omega(N/\log^{1/2} N)$. (Remember that although the
maximum total overlap of exchange edges is not an upper bound on the number of
horizontal tracks needed to insert the exchange edges, it is a lower bound.)

The $\Omega(N/\log^{1/2} N)$ lower bound on maximum overlap is easily
established for the layout according to necklace size since
$\Omega(N/\log^{1/2} N)$ exchange edges link nodes of size $k/2$ to nodes of
size $k/2+1$. The lower bound on maximum overlap is somewhat more difficult to
prove for the layout according to necklace radius. The first step in the proof
is to show that at least $N/2$ exchange edges are contained within a square of
side length $ck^{1/2}$ centered at the origin of the complex plane diagram
(where $c$ is a constant). (This can be done by using the techniques developed
in section 4c.) Next consider the sum (over $i$) of the total overlaps at points
corresponding to radii of $i/2$ for $1 \le i \le ck^{1/2}$. Because the complex
plane diagram is radially symmetric, it is possible to show that at least
$\Omega(N)$ exchange edges are counted in this sum. Thus the overlap at one of
these points must be at least $\Omega(N/k^{1/2})$.

Since Thompson [T80] has shown that any layout for the $N$-node
shuffle-exchange graph must have area at least $\Omega(N^2/\log^2 N)$, we know
that at least $\Omega(N/\log N)$ horizontal tracks are needed to insert the
exchange edges for any ordering of necklaces in the level-necklace grid.
However, there is no ordering of the necklaces known for which the exchange
edges can be inserted using $o(N/\log^{1/2} N)$ horizontal tracks. This suggests
an interesting open question since it would be nice to find an
$O(N^2/\log^2 N)$-area layout based on the complex plane diagram. (Although an
asymptotically optimal $O(N^2/\log^2 N)$-area layout for the shuffle-exchange
graph has recently been found by Kleitman, Leighton, Lepley and Miller
[KLLM81], it is rather complicated and of limited practical use.)

Although we do not know of necklace orderings for which the exchange edges can
be inserted using $o(N/\log^{1/2} N)$ horizontal tracks, we do know of orderings
for which the maximum total overlap of exchange edges is at most
$O(N \log\log N/\log N)$. For example, an ordering of the necklaces by minimum
value has a maximum total overlap of $\Theta(N \log\log N/\log N)$. (The minimum
value of a necklace is simply the minimum of the values of the nodes contained
in the necklace.)

Interestingly, an analysis of the minimum (over all orderings) of the maximum
total overlap for small values of $N$ indicates that there may always be an
ordering for which the maximum total overlap is at most $O(N/\log N)$, the least
possible. In fact, for $3 \le k \le 7$, this minimum maximum overlap is
precisely $\lfloor (2^k - 2)/k \rfloor$. A summary of the minimum maximum
overlap data for small values of $N$ is included in Table 1.

Table 1

Maximum Overlap of Best Known Orderings

     k        N     maximum overlap of       optimal?
                    best known ordering

     3        8              2                 yes
     4       16              3                 yes
     5       32              6                 yes
     6       64             10                 yes
     7      128             18                 yes
     8      256             33                 yes
     9      512             62                  ?
    10     1024            115                  ?
    11     2048            214                  ?
    12     4096            388                  ?
    13     8192            754                  ?
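The closed form quoted for small $k$ can be compared against the entries of Table 1 that are marked optimal. The sketch below assumes that the brackets in the formula denote a floor, which is the reading that reproduces the table entries; the dictionary simply transcribes the rows marked "yes".

```python
# Rows of Table 1 marked optimal: k -> maximum overlap of best known ordering.
optimal_overlap = {3: 2, 4: 3, 5: 6, 6: 10, 7: 18, 8: 33}

def closed_form(k):
    """floor((2^k - 2) / k); the floor is an assumed reading of the brackets."""
    return (2 ** k - 2) // k

# For each k, record whether the closed form reproduces the table entry.
matches = {k: closed_form(k) == value for k, value in optimal_overlap.items()}
```

The formula agrees with the table for $k = 3, \ldots, 7$ and first departs from it at $k = 8$, where it gives 31 against the tabulated 33.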

In  addition  to  varying  the  order  of  the  necklaces,  improvements  in  the  layout 
may  also  be  made  by  rearranging  the  level  assignments  of  the  exchange  edges.  For 
example,  the  layout  of  the  32-node  shuffle-exchange  graph  shown  in  Figure  7  was 
constructed in this way. (The careful reader will notice that we have also
manipulated the necklaces somewhat in order to produce this layout.) For a more
detailed  discussion  of  the  manner  in  which  exchange  edges  can  be  reassigned,  we 
refer  the  reader  to  [LM81].  (Such  layouts  have  also  been  used  in  conjunction  with 
the  Blue  Chip  Project  at  Purdue  [S81].) 
Figure 7: An improved layout for the 32-node shuffle-exchange graph.